Text Classification Using Stochastic Keyword Generation

Authors

Cong Li

Ji-Rong Wen

Hang Li

Abstract

This paper considers how to improve text classification when summaries of the texts, as well as the texts themselves, are available during learning. Summaries can be classified more accurately than texts, so the question is how to use the summaries effectively in learning. This paper proposes a new method for addressing this problem, using a technique referred to as "stochastic keyword generation" (SKG). In the proposed method, the SKG model is trained on the texts and their associated summaries. In classification, SKG first maps a text into a vector of probability values, each of which corresponds to a keyword. Text classification is then conducted on the mapped vector. The method has been applied to email classification for an automated help desk. Experimental results indicate that the proposed SKG-based method significantly outperforms other methods.
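The abstract describes the pipeline only at a high level: train an SKG model on text-summary pairs, map each text to a vector of per-keyword probabilities, then classify that vector. The Python sketch below illustrates the pipeline under two assumptions that do not come from the paper. First, each keyword's probability is estimated by an independent Bernoulli naive Bayes model that predicts whether the keyword occurs in a text's summary. Second, a nearest-centroid classifier stands in for whatever classifier is trained on the mapped vectors. The names `SKGSketch`, `train_centroids`, and `classify` are hypothetical, and so are the toy help-desk emails.

```python
import math
from collections import defaultdict


def tokenize(text):
    """Lowercased whitespace tokens as a set (binary features; an assumption)."""
    return set(text.lower().split())


class SKGSketch:
    """Hypothetical stand-in for an SKG model (not the paper's estimator).

    For each keyword seen in the training summaries, a Bernoulli naive Bayes
    model estimates P(keyword appears in the summary | words of the text).
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant

    def fit(self, texts, summaries):
        docs = [tokenize(t) for t in texts]
        sums = [tokenize(s) for s in summaries]
        self.vocab = sorted(set().union(*docs))
        self.keywords = sorted(set().union(*sums))  # keywords come from summaries
        self.params = {}
        for k in self.keywords:
            pos = [d for d, s in zip(docs, sums) if k in s]
            neg = [d for d, s in zip(docs, sums) if k not in s]
            prior = (len(pos) + self.alpha) / (len(docs) + 2 * self.alpha)
            self.params[k] = (prior, self._word_probs(pos), self._word_probs(neg))
        return self

    def _word_probs(self, group):
        """Smoothed P(word present | group) for every vocabulary word."""
        counts = defaultdict(int)
        for d in group:
            for w in d:
                counts[w] += 1
        denom = len(group) + 2 * self.alpha
        return {w: (counts[w] + self.alpha) / denom for w in self.vocab}

    def transform(self, text):
        """Map a text to a vector of keyword probabilities (one per keyword)."""
        words = tokenize(text)
        vec = []
        for k in self.keywords:
            prior, p_pos, p_neg = self.params[k]
            log_pos, log_neg = math.log(prior), math.log(1 - prior)
            for w in self.vocab:  # out-of-vocabulary words are ignored
                if w in words:
                    log_pos += math.log(p_pos[w])
                    log_neg += math.log(p_neg[w])
                else:
                    log_pos += math.log(1 - p_pos[w])
                    log_neg += math.log(1 - p_neg[w])
            # Normalize the two joint log-scores into a posterior, stably.
            m = max(log_pos, log_neg)
            e_pos, e_neg = math.exp(log_pos - m), math.exp(log_neg - m)
            vec.append(e_pos / (e_pos + e_neg))
        return vec


def train_centroids(vectors, labels):
    """Average the mapped vectors of each class (nearest-centroid training)."""
    sums, counts = {}, defaultdict(int)
    for v, y in zip(vectors, labels):
        acc = sums.setdefault(y, [0.0] * len(v))
        for i, x in enumerate(v):
            acc[i] += x
        counts[y] += 1
    return {y: [x / counts[y] for x in acc] for y, acc in sums.items()}


def classify(centroids, vec):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    return min(centroids, key=lambda y: sum((a - b) ** 2 for a, b in zip(centroids[y], vec)))


# Toy help-desk data (invented for illustration, not from the paper).
texts = [
    "my printer will not print anything today",
    "printer jams paper every time i print",
    "cannot log in password rejected by server",
    "forgot my password and login fails",
]
summaries = ["printer failure", "printer jam", "password login problem", "password reset"]
labels = ["hardware", "hardware", "account", "account"]

skg = SKGSketch().fit(texts, summaries)
centroids = train_centroids([skg.transform(t) for t in texts], labels)
query_vec = skg.transform("the printer does not print")
print(dict(zip(skg.keywords, (round(p, 3) for p in query_vec))))
print(classify(centroids, query_vec))
```

The keyword set is taken from the summaries, so each text is re-expressed in the summaries' vocabulary before classification. This is one way to read the abstract's motivation, since summaries are the representation that can be classified more accurately.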


Similar resources

Text Classification using Language-independent Pre-processing

A number of language-independent text pre-processing techniques, to support multi-class single-label text classification, are described and compared. A simple but effective statistical keyword identification approach is proposed, coupled with a number of phrase identification mechanisms. Experimental results are presented.


Statistical Identification of Key Phrases for Text Classification

Algorithms for text classification generally involve two stages, the first of which aims to identify textual elements (words and/or phrases) that may be relevant to the classification process. This stage often involves an analysis of the text that is both language-specific and possibly domain-specific, and may also be computationally costly. In this paper we examine a number of alternative keyw...


Automatic Content-Based Categorization of Wikipedia Articles

Wikipedia’s article contents and its category hierarchy are widely used to produce semantic resources which improve performance on tasks like text classification and keyword extraction. The reverse – using text classification methods for predicting the categories of Wikipedia articles – has attracted less attention so far. We propose to “return the favor” and use text classifiers to improve Wik...


Short Text Feature Enrichment Using Link Analysis on Topic-Keyword Graph

In this paper, we propose a novel feature enrichment method for short text classification based on link analysis of a topic-keyword graph. After topic modeling, we re-rank the keyword distribution extracted by the biterm topic model (BTM) to make the topics more salient. Then a topic-keyword graph is constructed and link analysis is conducted. As a complement, the K-L divergence is integrated wit...


Keyword Reduction for Text Categorization using Neighborhood Rough Sets

Keyword reduction is a technique that removes some less important keywords from the original dataset. Its aim is to decrease the training time of a learning machine and improve the performance of text categorization. Some researchers have applied rough sets, a popular computational intelligence tool, to reduce keywords. However, the classical rough set model, which is usually adopted, can just ...


Publication date: 2003
